Option to split during conversion #6942
I've added support for The counterpoint I can see to doing this is that
This is already a good start. Could you add an end-to-end usage example in the summary?
Sure thing (I assume you mean examples of usage and expected outputs). I also plan to rework the implementation by consolidating code into a new
I'll need to implement for Anyway,
Got it - will only implement for
You can modify the gguf package in the
That's what I've been doing so far; will check out instructions to contribute, thanks!
Testing on Mistral 7B Instruct, this branch's
Running tests on my side for all
Will keep track of tests here as I go. Picking one model from each architecture in It also seems like the current
Leaving a note for myself to watch for merge conflicts with #6511. Development on this branch has slowed down as I'm pretty busy.
Noting time to convert baichuan-inc/Baichuan2-7B-Chat. New branch, New branch, no split: master: Note that these conversions were done writing the outfile over 2.5GbE, so considerable time was spent just saving the file. Will test more later, but the change doesn't seem to increase conversion time significantly.
Merge attempted. Some lines were ambiguous, so @christianazinn should give this a look-over to make sure the intent is still correct.
I'll check in a few hours and fix conflicts.
The new
Co-authored-by: compilade <[email protected]>
I'm satisfied with how this turned out. I did not test this extensively, but from the conversions I tried (with `--split-max-size` and with no split, both with `q8_0` and `f16`), this worked well.
A future PR to add split model support to `GGUFReader` would be nice.
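The behaviour exercised in this review (a `--split-max-size` limit versus no split) amounts to grouping tensors into shards under a tensor-count or byte limit. Below is a minimal sketch of that idea, assuming tensors are written in their original order, are never split across files, and that a "small first shard" (as described in #6463) means shard 0 carries metadata only. `Shard` and `plan_shards` are hypothetical names for illustration, not the PR's actual `GGUFWriter` code.

```python
from dataclasses import dataclass, field


@dataclass
class Shard:
    # Hypothetical container: (tensor name, size in bytes) pairs for one output file.
    tensors: list[tuple[str, int]] = field(default_factory=list)

    @property
    def n_bytes(self) -> int:
        return sum(size for _, size in self.tensors)


def plan_shards(
    tensors: list[tuple[str, int]],
    max_tensors: int = 0,
    max_size: int = 0,
    small_first_shard: bool = True,
) -> list[Shard]:
    """Greedily pack tensors into shards, keeping their original order.

    A limit of 0 means "no limit". A tensor is never split across shards, so a
    single tensor larger than max_size ends up alone in its own shard. With
    small_first_shard=True, shard 0 carries no tensor data (metadata only).
    """
    shards: list[Shard] = [Shard()] if small_first_shard else []
    current = Shard()
    for name, size in tensors:
        has_data = len(current.tensors) > 0
        too_many = max_tensors > 0 and len(current.tensors) >= max_tensors
        too_big = max_size > 0 and has_data and current.n_bytes + size > max_size
        if too_many or too_big:
            # Close the current shard and start a new one for this tensor.
            shards.append(current)
            current = Shard()
        current.tensors.append((name, size))
    if current.tensors or not shards:
        shards.append(current)
    return shards


# Toy sizes in bytes; real tensors are far larger.
tensors = [
    ("token_embd.weight", 40),
    ("blk.0.attn_q.weight", 30),
    ("blk.0.attn_k.weight", 30),
    ("output.weight", 40),
]
print([s.n_bytes for s in plan_shards(tensors, max_size=64)])  # [0, 40, 60, 40]
```

Packing greedily in order keeps every tensor's relative position the same as in an unsplit conversion; only the file boundaries change.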
Forgot to mark as ready for review. Can probably be merged.
A few days have passed with the merge-ready label, passing CI, and an approval. Consensus achieved? I'll presume so by the end of the week.
It's been about a week and I see no dissent so far.
* support splits in convert.py
* Support split by size and dry run to write estimated shards/filesizes
* Move split functionality to new GGUFManager class
* fix improper function signature
* tentative push of convert-hf-to-gguf support
* resolve merge + SplitArguments for easier parsing
* Fix eager tensor memory leak and remove convert.py changes
  Removed a memory leak caused by unexpected reference retention to eager tensors. Also removed GGUFManager functionality in convert.py in favor of specializing for convert-hf-to-gguf.py.
* refactor SplitStrategy to be a deque
  Instead of having SplitStrategy have a `data` field that is a deque, just have SplitStrategy be a subclass of deque itself.
* fix Q8 quantization
* remove unnecessary imports in gguf_manager
* fix final? merge issue
* fix gguf_writer placement and remove comments
* oops, actually fix gguf_writer placement
* reduce duplicated code from gguf_writer
* further simplify GGUFManager
* simplify even further and standardize with GGUFWriter
* reduce diffs with master
* form shards while adding tensors, SHA256 sums agree with master
* re-add type hint
  Co-authored-by: compilade <[email protected]>
* GGUFWriter compatibility fix
  Co-authored-by: compilade <[email protected]>
* Shard dataclass and un-negative dont_add_architecture
* type consistency in format_n_bytes_to_str
* move kv keys to constants.py
* make pathlib explicit
* base-1024 bytes to base-1000
* rename GGUFManager to GGUFWriterSplit
* Update gguf-py/gguf/constants.py
  Co-authored-by: compilade <[email protected]>
* fix convert-hf-to-gguf.py permissions
* fix line endings
* Update gguf-py/gguf/gguf_writer_split.py
  Co-authored-by: compilade <[email protected]>
* convert-hf : restore executable file permission
* examples/convert-legacy-llama.py: restore executable file permission
* reinstate original gguf package import and fix type annotation
* attempt to appease the linter
* attempt 2 to appease the linter
* attempt 3 to appease the linter
* comma consistency
* Update convert-hf-to-gguf.py
  Co-authored-by: compilade <[email protected]>
* edit cmd line args
* use simplification from ggerganov#7827
* kv/ti data are still wrong
* try to refactor kv data (still fails)
* fix ti data messiness
* tidy up
* fix linting
* actually make the linter happy
* cleanup round 1
* remove SplitStrategy, SplitArguments
* appease linter
* fix typing and clean up
* fix linting
* Update gguf-py/gguf/gguf_writer.py
  Co-authored-by: compilade <[email protected]>
* progress bar, fix split logic
* Update gguf-py/gguf/gguf_writer.py
  Co-authored-by: compilade <[email protected]>
* catch oversights
* Update gguf-py/gguf/gguf_writer.py
  Co-authored-by: compilade <[email protected]>
* Update gguf-py/gguf/gguf_writer.py
  Co-authored-by: compilade <[email protected]>
* Update gguf-py/gguf/gguf_writer.py
  Co-authored-by: compilade <[email protected]>
* Update gguf-py/gguf/gguf_writer.py
  Co-authored-by: compilade <[email protected]>
* Update gguf-py/gguf/gguf_writer.py
  Co-authored-by: compilade <[email protected]>
* swap bar orders
* Update gguf-py/gguf/gguf_writer.py
  Co-authored-by: compilade <[email protected]>
* Update gguf-py/gguf/gguf_writer.py
  Co-authored-by: compilade <[email protected]>
* compatibility fix
* Update gguf-py/gguf/gguf_writer.py
  Co-authored-by: compilade <[email protected]>
* Update convert-hf-to-gguf.py
  Co-authored-by: compilade <[email protected]>
---------
Co-authored-by: Brian <[email protected]>
Co-authored-by: compilade <[email protected]>
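Two of the commits above concern how byte counts are handled (`format_n_bytes_to_str`, and the switch from base-1024 to base-1000 units). A hypothetical sketch of decimal size parsing and formatting follows, assuming that size limits such as `64M` use the same base-1000 units, so `64M` would mean 64,000,000 bytes rather than 67,108,864. `parse_size` and `format_size` are illustrative names, not the functions in `gguf-py`, and the accepted suffixes are an assumption.

```python
# Decimal (base-1000) multipliers; whether the converter accepts exactly these
# suffixes is an assumption made for this sketch.
_UNITS = {"K": 1000, "M": 1000**2, "G": 1000**3, "T": 1000**4}


def parse_size(text: str) -> int:
    """Turn a size string such as "64M" or "1.5G" into a byte count."""
    text = text.strip().upper()
    if text and text[-1] in _UNITS:
        return int(float(text[:-1]) * _UNITS[text[-1]])
    return int(text)


def format_size(n_bytes: int) -> str:
    """Render a byte count with base-1000 suffixes, e.g. 64000000 -> "64.0M"."""
    value = float(n_bytes)
    for suffix in ("", "K", "M", "G"):
        if abs(value) < 1000:
            return f"{value:.1f}{suffix}"
        value /= 1000
    return f"{value:.1f}T"


print(parse_size("64M"), format_size(parse_size("64M")))  # 64000000 64.0M
print(64 * 1024**2)  # 67108864: what "64M" would mean in base-1024 units
```

Using one unit base for both parsing the limit and printing estimated shard sizes keeps a `--dry-run` report directly comparable with the limit the user typed.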
This PR introduces additional options to `convert.py` that allow users to split a model into shards while converting, rather than having to do it after conversion, including a default small first shard as outlined in #6463. Other functionality we ought to have includes `--split-max-size` (so far it's just `--split-max-tensors`), displaying estimated shard sizes, dry running, and adding sharding for the other `convert-*-to-*.py` scripts. This will be considered a draft until those are worked out. It also needs considerable testing, but luckily, as this deals with the Python scripts, it can be tested easily.
Usage
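The flags exercised in the examples can be pictured as ordinary `argparse` options. This is a hypothetical sketch: only the flag names come from this description, while the defaults, help text, and the reading of `--large-first-shard` as "also put tensor data in the first shard" (the inverse of the small first shard from #6463) are assumptions, not the script's real definitions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical declarations mirroring the flag names in this PR description;
    # defaults and help strings are assumptions, not convert.py's actual code.
    parser = argparse.ArgumentParser(description="convert a model to GGUF")
    parser.add_argument("model", help="directory containing the model files")
    parser.add_argument("--outfile", help="path of the output file")
    parser.add_argument("--outtype", default="f16", help="output tensor type, e.g. f16")
    parser.add_argument("--split", action="store_true", help="write the model as several shards")
    parser.add_argument("--split-max-tensors", type=int, default=0,
                        help="max tensors per shard (0 = no limit)")
    parser.add_argument("--split-max-size", default="0",
                        help='max shard size such as "64M" (0 = no limit)')
    parser.add_argument("--dry-run", action="store_true",
                        help="only print the planned shards and their sizes")
    parser.add_argument("--large-first-shard", action="store_true",
                        help="also put tensor data in the first shard")
    return parser


args = build_parser().parse_args(
    ["--outfile", "out.gguf", "--outtype", "f16", "model_dir",
     "--split", "--split-max-tensors", "20", "--dry-run", "--large-first-shard"]
)
print(args.split_max_tensors, args.dry_run, args.large_first_shard)  # 20 True True
```

`argparse` turns dashes into underscores, so `--split-max-tensors` is read back as `args.split_max_tensors`; keeping `--split-max-size` as a string leaves unit parsing to a separate step.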
(examples are using zephyr-smol_llama-100m-sft-full)
Example with `--split-max-size`:
python3 convert.py --outfile /path/to/outfile.gguf --outtype f16 /path/to/safetensors --split --split-max-size 64M
Output: equal to what's printed to stdout from `master`, then
With `--split-max-size 200M` (or any number greater than the total resultant size), it gives:
Example with `--split-max-tensors`, `--dry-run`, and `--large-first-shard`:
python3 convert.py --outfile /path/to/outfile.gguf --outtype f16 /path/to/safetensors --split --split-max-tensors 20 --dry-run --large-first-shard
Output: equal to what's printed to stdout from `master`, then
With `--split-max-tensors 64` (or any number greater than the total tensor count), it gives:
References
gguf-split add a default option to not include tensors data in first shard #6463